Eugenio Del Prete, Ana Margarida Campos, Fabio Della Rocca, Angelo Fontana, Genoveffa Nuzzo and Claudia Angelini
last revised on 28/03/2022
This is the documentation of the ADViSELipidomics package. ADViSELipidomics is a novel Shiny app for the preprocessing, analysis, and visualization of lipidomics data. It copes with the outputs from LipidSearch and LIQUID for lipid identification and quantification, and with data available from the Metabolomics Workbench. ADViSELipidomics extracts information by parsing lipid species (using LIPID MAPS classification) and, together with information available on the samples, allows performing several exploratory and statistical analyses. In the presence of internal lipid standards in the experiment, ADViSELipidomics can normalize the data matrix, providing absolute values of concentration per lipid and sample. Moreover, it allows the identification of differentially abundant lipids in simple and complex experimental designs, dealing with batch effect correction.
If you use ADViSELipidomics in your publications, please cite:
E. Del Prete et al. (2022) ADViSELipidomics: a workflow for the analysis of lipidomics data DOI: …………
ADViSELipidomics is a stand-alone Shiny application developed in the RStudio IDE (RStudio > 1.4) and implemented in the R language (R > 4.0). It is available at the following GitHub page: https://github.com/ShinyFabio/ADViSELipidomics. ADViSELipidomics is multi-platform: we tested its functionality on the main operating systems (Windows 10, Windows 11, macOS 12, Ubuntu 18, Ubuntu 20). The user must first install R (https://www.r-project.org) and RStudio (https://www.rstudio.com), if not yet available. Then, before installing ADViSELipidomics, the user might need to perform a few supplementary steps that depend on the operating system:
Windows: install Rtools, a collection of tools necessary for building R packages on Windows, available at the following link: https://cran.r-project.org/bin/windows/Rtools
macOS: run the following commands in the terminal:
brew install imagemagick@6
brew install cairo
Ubuntu: run the following commands in the terminal:
sudo apt install build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev
sudo apt-get install libcairo2-dev
sudo apt-get install libxt-dev
sudo apt install libmagick++-dev
sudo apt-get install libc6
sudo apt-get install libnlopt-dev
Then, on all operating systems, ADViSELipidomics can be installed by typing the following code in the RStudio console:
if(!require("devtools")){
  install.packages("devtools")
}
library(devtools)
install_github("ShinyFabio/ADViSELipidomics")
We kindly suggest updating all the R packages requested during the installation of the ADViSELipidomics Shiny application. To run ADViSELipidomics, type the following code in the RStudio console:
library(ADViSELipidomics)
run_ADViSELipidomics()
When a new version of ADViSELipidomics is released, it can be updated by running the same installation code again.
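Before installing or updating, it can help to confirm that the R session meets the stated requirement. The snippet below is only a minimal sketch: the helper name `check_advise_setup` is hypothetical (it is not part of ADViSELipidomics), and reading "R > 4.0" as "version 4.0.0 or later" is an assumption.

```r
# Hypothetical helper, not part of ADViSELipidomics.
# Assumption: "R > 4.0" is read here as "R 4.0.0 or later".
check_advise_setup <- function(min_r = "4.0.0", pkg = "ADViSELipidomics") {
  list(
    r_version_ok  = getRversion() >= min_r,                 # R version requirement
    pkg_installed = requireNamespace(pkg, quietly = TRUE)   # TRUE once the package is installed
  )
}

status <- check_advise_setup()
print(status)
```

If `pkg_installed` is `FALSE`, the installation code above still needs to be run.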
ADViSELipidomics allows the user to import files for different types of data:
* LipidSearch or LIQUID. ADViSELipidomics reads the data files containing the chromatographic peak area or peak intensity per lipid, obtained as output from external software for identifying and quantifying lipids (ADViSELipidomics currently supports the output formats of LipidSearch and LIQUID). It also requires the Target File, with details on the samples (such as treatments or biological replicates), and the Internal Reference File, with the bounds for the filtering step in the following modules. ADViSELipidomics shows a quality plot based on the sum of chromatographic peak area per sample (or replicate). In the case of LipidSearch output associated with internal lipid standards, ADViSELipidomics also requires all the Calibration Files for the construction of the calibration curves.
* Metabolomics Workbench. ADViSELipidomics can download selected lipidomic experiments from the online repository in real time.
* Excel. The user can upload two Excel files: the data matrix and the Target File.
* SummarizedExperiment. The user can upload a SummarizedExperiment R object (SE), with several types of information (data matrix, information on lipids, information on samples, metadata if available).
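As shown above, the required files depend on the type of data. To sum up, here is a list of the required files for each data type:
* LipidSearch with internal standards: data files (.txt), Target File, Internal Reference File, and Calibration Files (two Excel files plus the concentration files).
* LipidSearch without internal standards: data files (.txt), Target File, and Internal Reference File.
* LIQUID: data files (.tsv), Target File, and Internal Reference File.
* Excel: Data Matrix File and Target File.
* SummarizedExperiment: the SE object saved as a .rds file.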
For Metabolomics Workbench, you don’t need to import anything: just choose the Metabolomics Workbench study ID.
Before running ADViSELipidomics, make sure that you have all the required files and that they are filled in properly. Apart from the output files from LipidSearch and LIQUID, ADViSELipidomics requires that the Excel files have a given structure with some mandatory columns. Here we provide a guide to the creation of these Excel files.
The outputs of LipidSearch and LIQUID are text files containing information on chromatographic peak area or peak intensity per lipid. If your data come from LipidSearch, you should have a deuterated file and a non-labeled file for each sample (or replicate), with a .txt extension. If your data come from LIQUID, you can have a positive and a negative file, with a .tsv extension. In either case, put your data files in a single folder and rename each file so that it starts with your sample ID, following the naming scheme described below.
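Example:
Your sample is called “AF-1CM” and you have two technical replicates. Then, depending on the output software, the names of the data files should be:
* LipidSearch: “AF-1CM_deuterated_1.txt”, “AF-1CM_deuterated_2.txt”, “AF-1CM_nonlabeled_1.txt”, “AF-1CM_nonlabeled_2.txt”
* LIQUID: “AF-1CM_positive_1.tsv”, “AF-1CM_positive_2.tsv”, “AF-1CM_negative_1.tsv”, “AF-1CM_negative_2.tsv”
The last two characters (e.g. “_1”) refer to the technical replicate. If you don’t have technical replicates, just remove these last two characters (for example, in the case of LipidSearch you should have “AF-1CM_deuterated.txt” and “AF-1CM_nonlabeled.txt”).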
NOTE
In the choice of your sample name, it’s better to avoid special
characters and **DO NOT use underscores (_)**. This character is used by
ADViSELipidomics to split the file name into three parts: the sample
name, the type of file (deuterated/nonlabeled or positive/negative), and
the technical replicate as shown in the following picture:
[Figure: structure of a data file name]
For example, a bad name could be “Blood_bag_deuterated_1.txt”, while a good name is “Blood-bag_deuterated_1.txt”.
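The splitting rule described in the note can be illustrated with a few lines of base R. This is only a sketch of the idea, not the code ADViSELipidomics actually runs; the function name `parse_datafile_name` is hypothetical.

```r
# Hypothetical helper illustrating the naming rule; not the app's own parser.
parse_datafile_name <- function(file_name) {
  stem  <- tools::file_path_sans_ext(basename(file_name))  # drop folder and extension
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]          # split on underscores
  if (!(length(parts) %in% c(2, 3))) {
    stop("Expected 2 or 3 underscore-separated parts in: ", file_name)
  }
  list(
    sample    = parts[1],
    type      = parts[2],
    replicate = if (length(parts) == 3) parts[3] else NA_character_
  )
}

good <- parse_datafile_name("Blood-bag_deuterated_1.txt")
str(good)

# An underscore inside the sample name yields four parts, so the name is rejected.
bad <- try(parse_datafile_name("Blood_bag_deuterated_1.txt"), silent = TRUE)
inherits(bad, "try-error")
```

The second call shows why an underscore in the sample name is a problem: the name no longer splits into sample, file type, and replicate.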
The Target File is an Excel file that contains all the information about your samples. It is the most important file since it is used for LipidSearch import, LIQUID import, and User’s Excel File import. This file has some mandatory columns that must be filled in according to the following criteria:
In the picture below there is a Target File example where the mandatory columns are highlighted in yellow and the optional column in green. You can add any other informative column to the Target File; just avoid special characters like \^$?*/|+() and whitespace. You can use - or _ instead of whitespace.
NOTE
If your Target File doesn’t contain at least one informative column
about the samples (e.g. Product, Model_type, etc.), you can’t perform
any exploratory or statistical analysis.
[Figure: Target File example]
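The recommendation about special characters and whitespace can be checked before uploading. The sketch below assumes the column names have already been read into a character vector; the function name `find_bad_names` and the example column names are hypothetical.

```r
# Hypothetical check for the characters listed above: \ ^ $ ? * / | + ( ) and whitespace.
find_bad_names <- function(x) {
  x[grepl("[\\\\^$?*/|+()\\s]", x, perl = TRUE)]
}

cols <- c("SampleID", "Product", "Model_type", "Model type", "Dose(mg)")
bad_cols <- find_bad_names(cols)
bad_cols
```

Here `bad_cols` contains "Model type" (whitespace) and "Dose(mg)" (parentheses), while names using - or _ pass the check.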
The example target file can be downloaded from here:
With the LipidSearch and LIQUID options, ADViSELipidomics also requires another Excel file, here called the Internal Reference File. It contains the list of the Internal Standard lipids defined per class and adduct, upper/lower bounds for the number of carbon atoms, upper/lower bounds for the number of double bonds, nominal standard concentration, and upper/lower bounds for the concentration linearity in the calibration curves. This file has many mandatory columns, which depend both on the external software (LipidSearch, LIQUID) and on the presence of internal standards (only for LipidSearch).
The picture below shows an Internal Reference File example in the case of LipidSearch and the presence of Internal Standards. In yellow are the mandatory columns, and in green the columns needed only in the presence of Internal Standards.
[Figure: Internal Reference File example for LipidSearch with Internal Standards]
The Internal Reference File example for the LipidSearch with Internal Standards option can be downloaded from here:
Internal_Reference_file_LipidSearch_withIS.xlsx
The picture below shows an Internal Reference File example in the case of LIQUID. In yellow are the mandatory columns.
[Figure: Internal Reference File example for LIQUID]
The Internal Reference File example for the LIQUID option can be downloaded from here:
In the case of LipidSearch, if you have Internal Standards, you can choose whether to use them. If you do, you also need to upload the Calibration Files: two Excel files and the data files coming from LipidSearch (here called concentration files). The concentration files are the same kind of .txt files described in chapter 2.1; please refer to that chapter for how to rename them. Make sure that all the concentration files are in their own folder and are not mixed with the data files of chapter 2.1. ADViSELipidomics also requires two Calibration Excel files, one for the Nonlabeled and one for the Deuterated. They share the same structure:
The picture below shows an example of a Calibration Excel file for the deuterated.
[Figure: Calibration Excel file example for the deuterated]
A toy example for the Calibration Deuterated and Calibration Nonlabeled Excel files can be downloaded here:
If you already have a matrix file containing the abundance for each lipid, you need just two Excel files: the Target File and the Data Matrix File. Here the Target File has only one mandatory column, the SampleID. The Data Matrix (.xlsx file) must have the list of the lipids in the first column, which must be called “Lipids”, followed by one column per sample (or replicate), with column names that match the SampleID values of the Target File. The matrix does not need to be complete (i.e. without missing values), since after the upload it is possible to filter and impute NAs. The picture below shows an example of the Data Matrix.
[Figure: Data Matrix example]
NOTE The column names in the data matrix must follow the same rules as the SampleID in the Target File. See chapter 2.2.
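The consistency between the Data Matrix and the Target File can be verified before uploading. The following is a minimal sketch on toy data frames standing in for the two Excel sheets; the function name `check_data_matrix`, the sample IDs, and the lipid names are hypothetical, and requiring the match in both directions is an assumption.

```r
# Toy stand-ins for the two Excel sheets (all values are made up).
target <- data.frame(SampleID = c("S1", "S2", "S3"),
                     Product  = c("A", "A", "B"))
data_matrix <- data.frame(
  Lipids = c("PC(34:1)", "TG(52:2)"),
  S1 = c(10.2, 5.1), S2 = c(NA, 4.8), S3 = c(9.7, 5.5),
  check.names = FALSE
)

# Hypothetical check; assumes sample columns and SampleIDs must match exactly.
check_data_matrix <- function(data_matrix, target) {
  problems <- character(0)
  if (!identical(names(data_matrix)[1], "Lipids")) {
    problems <- c(problems, "the first column must be called 'Lipids'")
  }
  sample_cols <- names(data_matrix)[-1]
  ids <- as.character(target$SampleID)
  not_in_target <- setdiff(sample_cols, ids)
  not_in_matrix <- setdiff(ids, sample_cols)
  if (length(not_in_target) > 0) {
    problems <- c(problems, paste("columns not in the Target File:", toString(not_in_target)))
  }
  if (length(not_in_matrix) > 0) {
    problems <- c(problems, paste("SampleIDs not in the Data Matrix:", toString(not_in_matrix)))
  }
  problems
}

problems <- check_data_matrix(data_matrix, target)
length(problems)  # 0 when the two tables agree
```

Missing values such as the `NA` for S2 are not reported, since they can be filtered and imputed after the upload.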
ADViSELipidomics allows the user to load a SummarizedExperiment (SE) object, saved as a .rds file, already prepared or previously downloaded after running ADViSELipidomics. Since the required SE object has a complex structure, we do not recommend uploading an SE object that was not downloaded from ADViSELipidomics. The idea behind this option is that the user can save the SE object after the preprocessing steps and perform the exploratory and statistical analysis at a later time.
In the case of Metabolomics Workbench, you don’t need to import anything, because ADViSELipidomics downloads a selected Metabolomics Workbench experiment and converts it into an SE object.
ADViSELipidomics has a graphical user interface (GUI) implemented using the shiny and golem R packages. It has five main sections: Home, Data Import & Preprocessing, SumExp Visualization, Exploratory Analysis, and Statistical Analysis. Each section is accessible from a side menu on the left.
The Home section contains general information about ADViSELipidomics like the citation, the link to the GitHub page, and the link to this manual. From the “Start!” button it is possible to go to the following section where the user can upload the lipidomic data.
This section allows the user to import and process lipidomic data from various sources. When you open this section for the first time after launch, a message box appears and asks you to enter your name and your company. This information will be stored in the final output of ADViSELipidomics. By default, if you click on “Run”, the User will be “Name” and the Company will be “Company”. The picture below shows the Data Import & Preprocessing section (with the different parts highlighted by red rectangles).
[Figure: Data Import & Preprocessing section]
Since the option LipidSearch output with Internal Standard (IS) has the largest number of required files and steps, here we provide a complete guide for this case. The guide also applies to LipidSearch without IS and to LIQUID: in these cases, the only difference is that the CALIBRATION module (chapter 3.2.2) is absent.
The first module is the IMPORTING & FILTERING module where the user can upload the Target File, the Internal Reference File, and the Data files that come from LipidSearch.
IMPORTING & FILTERING module
Once the previous module is completed, the CALIBRATION module appears next to it. The CALIBRATION module creates the calibration curves and the calibration matrix. It uses the Internal Lipid Standards reported in the Internal Reference File, and the correspondence between the Concentration Files and the lipid classes declared in the Calibration Files. This module extracts the relationship between peak area and concentration for each internal lipid standard, fits the calibration curves with a linear model, and plots them. The linear regression model can be classical or robust, with zero or non-zero intercept. Finally, the calibration matrix summarizes all the points from the calibration curves. After the calibration process, ADViSELipidomics stores the slope and intercept values for the recovery module. As already stated, this module appears only if you are using LipidSearch output with Internal Standards (and you clicked “Yes” on the radio button that asks “Do you have internal standards?”). In this module, you need the two Calibration Files (.xlsx, see chapter 2.4) and the Concentration Files coming from LipidSearch for the internal standards (.txt, see chapter 2.1).
CALIBRATION module
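The four model variants mentioned above (classical or robust, with zero or non-zero intercept) can be sketched in R as follows. This is an illustration on made-up numbers, not the code used inside ADViSELipidomics: the column names `conc` and `area`, the choice of peak area as the response variable, and the use of `MASS::rlm` for the robust fit are all assumptions.

```r
library(MASS)  # rlm(); MASS is a recommended package, usually installed with R

# Toy calibration points for one internal standard (values are made up).
calib <- data.frame(
  conc = c(1, 2, 4, 8, 16, 32),
  area = c(3.1, 4.9, 9.05, 16.95, 33.1, 64.9)
)

# Classical least squares, with and without intercept.
fit_classic      <- lm(area ~ conc, data = calib)
fit_classic_zero <- lm(area ~ 0 + conc, data = calib)

# Robust regression (M-estimation), with and without intercept.
fit_robust      <- rlm(area ~ conc, data = calib)
fit_robust_zero <- rlm(area ~ 0 + conc, data = calib)

# Slope and intercept are the values a later step would need.
coef(fit_classic)
coef(fit_robust)
```

A robust fit is mainly useful when one or two calibration points are outliers; on clean data such as this toy set, both approaches give nearly the same slope.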
This is the last module of the preprocessing menu where you can filter and impute missing values (NAs), build the SummarizedExperiment object, and download it.
MISSING DATA & SUMMARIZED EXPERIMENT
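The general idea of filtering and imputing missing values can be sketched on a toy matrix. The threshold (at most 50% NAs per lipid) and the imputation rule (per-lipid mean) are assumptions chosen for illustration; they are not necessarily the options offered by ADViSELipidomics.

```r
# Toy abundance matrix: lipids in rows, samples in columns (values are made up).
abund <- matrix(
  c(10, NA, 12, 11,
    NA, NA, NA,  5,
     7,  8, NA,  9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("lipid_A", "lipid_B", "lipid_C"), c("S1", "S2", "S3", "S4"))
)

# Step 1 (assumed rule): drop lipids with more than 50% missing values.
max_na_fraction <- 0.5
keep     <- rowMeans(is.na(abund)) <= max_na_fraction
filtered <- abund[keep, , drop = FALSE]

# Step 2 (assumed rule): replace each remaining NA with the mean of its lipid.
imputed <- t(apply(filtered, 1, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}))
imputed
```

In this toy case lipid_B is removed (3 of 4 values missing), and the single NA in lipid_A and lipid_C is filled with the row mean.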
Once you have successfully completed the Preprocessing module, the first thing you can do is check the newly created SummarizedExperiment (SE) object. This can be done in the SumExp Visualization menu. The complex structure of the SE object can be explored through a red gear icon, where you can choose which part of the SE object to show and summarise the data (if you have technical replicates).
SumExp Visualization
The picture above shows the rowData part of the SE object, which contains the annotation on lipids. Each lipid in the “Lipids” column contains a hyperlink to the SwissLipids online repository to provide structural, biological, and analytic details.
The Exploratory Analysis menu includes three sub-menus: Plots, Clustering, and Dimensionality Reduction.
This sub-menu allows the user to create different types of plots to show the trend and behavior of data, exploring them from lipid and/or sample points of view. It has 4 panels: Lipids, Scatterplot, Heatmap, and Quality plots.
The picture below shows an example of the Lipid class proportion
plot.
The Clustering sub-menu allows the user to cluster the data by lipids or samples. The user can choose the number of clusters and the clustering method among the following algorithms: hierarchical clustering (using single, complete, or Ward linkage) or partitioning clustering (k-means, PAM, Clara). If you choose a partitioning clustering, ADViSELipidomics first performs a PCA. Additional plots, such as the silhouette plot, can suggest the number of clusters to use.
Clustering
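The clustering options listed above correspond to standard R functions. The sketch below, on a made-up matrix with two obvious groups, shows hierarchical clustering with Ward linkage and partitioning clustering (k-means and PAM) on PCA scores, followed by a silhouette summary. It illustrates the techniques only; the exact calls and parameters used by ADViSELipidomics are not documented here and may differ.

```r
library(cluster)  # pam(), silhouette(); a recommended package, usually installed with R

# Toy matrix: 6 samples (rows) x 3 lipids (columns), two obvious groups.
x <- rbind(
  c(1.0, 1.2, 0.9), c(1.1, 0.9, 1.0), c(0.9, 1.0, 1.1),
  c(5.0, 5.2, 4.9), c(5.1, 4.9, 5.0), c(4.9, 5.0, 5.1)
)
rownames(x) <- paste0("S", 1:6)

# Hierarchical clustering; "single" and "complete" are other valid methods.
hc        <- hclust(dist(x), method = "ward.D2")
hc_groups <- cutree(hc, k = 2)

# Partitioning clustering on the first two principal components.
scores <- prcomp(x, scale. = TRUE)$x[, 1:2]
set.seed(1)  # k-means starts from random centers
km <- kmeans(scores, centers = 2)
pm <- pam(scores, k = 2)

# Average silhouette width: values close to 1 indicate well-separated clusters.
sil     <- silhouette(pm$clustering, dist(scores))
avg_sil <- mean(sil[, "sil_width"])
avg_sil
```

Repeating the last step for several values of `k` and comparing the average silhouette widths is one common way to choose the number of clusters.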
The Dimensionality Reduction sub-menu allows the user to choose between unsupervised (PCA) and supervised approaches (PLS-DA, sPLS-DA) to represent the data in a two- or three-dimensional space. It contains three panels: PCA, PLS-DA, and sPLS-DA.
Here’s an example of a 2D plot for the PCA.
The Statistical Analysis menu includes two sub-menus: Differential Analysis and Enrichment Analysis.
The Differential Analysis sub-menu applies statistical algorithms to identify lipids with a different abundance among samples associated with experimental conditions (e.g., treatment versus control). It has two panels: Build DA and Comparisons. The first allows the user to build and run the differential analysis, while the second shows the “differentially expressed” lipids with a Venn diagram and an Upset plot.
The picture below shows the first panel, Build DA, with the different parts highlighted by red rectangles.
[Figure: Build DA panel]
After ADViSELipidomics has performed the DA, you can go to the Comparisons panel to visualize the “differentially expressed” lipids and compare different contrasts pairwise using the Venn diagram and the Upset plot. The panel also reports the list of common lipids in tabular form. The two plots are available only when there are at least two contrasts.
Venn
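The list of common lipids reported with the Venn diagram is, conceptually, a set intersection. A minimal base R sketch with made-up contrast names and lipid names:

```r
# Hypothetical results: significant lipids for two contrasts (names are made up).
contrasts <- list(
  TreatA_vs_Ctrl = c("PC 34:1", "TG 52:2", "PE 36:2"),
  TreatB_vs_Ctrl = c("TG 52:2", "PE 36:2", "SM 34:1")
)

# Lipids shared by all contrasts (the overlap region of the Venn diagram).
common <- Reduce(intersect, contrasts)

# Lipids found only in the first contrast.
only_a <- setdiff(contrasts$TreatA_vs_Ctrl, contrasts$TreatB_vs_Ctrl)

common
only_a
```

With a single contrast there is nothing to intersect, which is why these comparisons need at least two contrasts.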
The Enrichment Analysis sub-menu builds lipid sets from the chemical features of the lipids, namely lipid class, total chain length (the sum of all carbon atoms in the tails), and total unsaturation (the sum of all the double bonds in the tails). Then, after a ranking of the lipids is defined (by logarithmic fold change, p-value, adjusted p-value, or B statistic), it identifies enriched sets of lipids using a permutation test. A robust result requires a few million permutations, so this process may take a while.
Enrichment
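To give an intuition of how a permutation test can flag an enriched lipid set, here is a deliberately simple sketch. It compares the mean ranking statistic of a set with the means of random sets of the same size. This is not the algorithm implemented in ADViSELipidomics: the function name `permutation_enrichment`, the test statistic, and the toy values are all assumptions.

```r
# Hypothetical, simplified permutation test for one lipid set.
permutation_enrichment <- function(stat, in_set, n_perm = 10000, seed = 1) {
  stopifnot(length(stat) == length(in_set), is.logical(in_set))
  set.seed(seed)
  k        <- sum(in_set)
  observed <- mean(stat[in_set])
  # Null distribution: mean statistic of random sets of the same size.
  null <- replicate(n_perm, mean(sample(stat, k)))
  # Two-sided p-value with the usual +1 correction.
  p_value <- (sum(abs(null) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

# Toy ranking: log fold changes for 12 lipids; the first 4 belong to one class.
logfc  <- c(2.1, 1.8, 2.4, 1.9, -0.1, 0.2, 0.0, -0.3, 0.1, -0.2, 0.3, -0.1)
in_set <- c(rep(TRUE, 4), rep(FALSE, 8))

res <- permutation_enrichment(logfc, in_set, n_perm = 2000)
res
```

The smallest p-value such a test can report is about 1 / (n_perm + 1), which is one reason why a large number of permutations is needed for reliable results.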
If you want to test the software with a LipidSearch output, here we provide the files used in Case Study #1, as described in the supplementary material of our paper: … [link to be inserted]
Before use, please extract the files from the archive. Since this experiment uses internal standards, you can follow the example in chapter 3.2.